some initial tweaks of the input format #132

superwhiskers · 2025-06-06T20:24:42Z

description

this pull request proposes some tweaks to the input format itself, and to the language of the specification

proposed changes

here's a quick summary of the changes made:

changes from the nested heterogeneous lists in the initial spec to lists of objects
tightening and clarification of language
removal of redundant parts of the specification

further comments

on line 118, there is a note about adding the variables monitoring is supported for in each device class to the device class table (as is with the nodes, as far as i know). this should be done and the comment should be removed before this is merged.
we should support messagepack in the gridkit codebase. this is noted at the top of the spec but i'd like to discuss it prior to merging and at what point to attempt implementing it (and how).
i'm not an electrical engineer so i may have messed up some aspects of the language. this should be reviewed (or maybe reverted? most of my edits were made with the intent of making aspects of the language clearer but this may not be necessary)
can we remove the case_ prefix from case_name, case_description, case_comments in the header section?
i removed the case_date field from the header as it seems to me that this role could be fulfilled by either case_description or case_comments, but was this assumption incorrect?
i haven't touched the output file format section of the specification. i'd like to discuss if the role of this could be filled better with a utf8 encoded json schema or something else prior to reworking it.
anything else that stands out as wrong or unclear.

pelesh

This looks good to me, but I'd let @abirchfield decide whether to merge this.

As an additional comment, it would be good to add another appendix with JSON style guidelines for input files.

src/Model/PhasorDynamics/INPUT_FORMAT.md

abirchfield · 2025-06-07T16:31:24Z

One of the goals of the format is that it will be non-repetitive to keep file size concise and enhance human readability. Notice that, with the proposed changes, the demo file has ballooned to about double its prior size. Imagine if we have 2000 buses, and each of them has the identical literal strings "id", "class", "name", "default_voltages", "voltage_base" rather than just an ordered list with known required items. Same thing if we have 300 GENROUs. We will end up with massive files with lots of repetitive text that are neither efficient for humans nor computers to read. It is a stylistic choice but I prefer the original.

superwhiskers · 2025-06-08T00:07:26Z

json parsers these days are quite fast, i don't think there would be any significant performance overhead to this. the main reason i think objects over arrays for device & node descriptions is the way to go is that they're more self-describing compared to the heterogeneous arrays used previously.

additionally, another change i made was pulling out the optional fields we had already into the object we now use for describing these. this removes our specified optional fields from the same namespace that other consumers of this format would use to implement their own extensions, freeing us from ensuring new optional fields we add don't have names that conflict with those used by others. this is also beneficial, and not something that could've been done with the previous approach unless we had two separate objects in the array: one for us and the other for extensions.

finally, the use of an object over a heterogeneous array is just more in line with people's expectations, but this may not be that convincing of an argument.

Co-authored-by: pelesh <peless@ornl.gov>

abirchfield · 2025-06-09T13:46:12Z

I committed some changes. I think I am persuaded to use more of an object-oriented format rather than tables. In going through it, I realized some of the names were not consistent with nomenclature decisions we made since the original draft of the format. @superwhiskers feel free to take another pass through.

superwhiskers · 2025-06-09T18:18:26Z

so there's one major comment here. in the initial pull request, i removed the case_date for this reason (mentioned in the initial comment):

i removed the case_date field from the header as it seems to me that this role could be fulfilled by either case_description or case_comments, but was this assumption incorrect?

i notice you've added it back. was my assumption incorrect?

superwhiskers · 2025-06-09T18:25:17Z

i'd also like to start some discussion of the output format. aside from csv, some other alternatives that would work are json lines or perhaps the same thing with messagepack.

this may not be the best choice if streaming output is not a concern we have (the primary benefit of such formats would be the ability to easily stream output from one area to another) but they also provide the benefits of json and messagepack, which is that they are better specified than something like csv, but issues with that may not come up for our use case.

abirchfield · 2025-06-09T18:29:09Z

It is typical and convenient to have a separate field in a case file for the case date.

abirchfield · 2025-06-09T18:35:45Z

What this would look like in JSON lines? Keep in mind we may have 100k or more time points for some studies. I would not recommend repeating all the output channel names for each time point. I guess the simplest would be just to add a [ and ] to the beginning and end of each of the lines in my example, so that every line is an array representing a time point, except the first line is an array with channel labels.

abirchfield · 2025-06-09T18:37:20Z

Of course the first thing I would do for my own workflow is write a script to convert it to CSV so I can open it in Excel...

superwhiskers · 2025-06-09T18:43:18Z

ah. if your main use case is opening it in excel then sticking with csv would probably be the better choice. json lines or something like that is probably excessive and wouldn't be of any benefit to you. the only reason i bring it up is that i've used it in the past for some simulation work and retaining the output channel names makes every single pipeline slightly more self-describing (the size increase is negligible if the json is compressed). additionally, there are many tools for working with json.

…o wording

superwhiskers · 2025-06-09T18:54:38Z

i've explicitly specified a date format (YYYY-MM-DD) so that there's no confusion about the date being represented by a case with this field and marked the field as optional as it wasn't used in the example and it may not always be applicable. i've also removed the "uncommon" label applied to some fields as it wasn't clear to me what that was there to indicate. suggesting that such a field overrides the default value seemed enough to me.

the remaining things i'd like to complete before this is ready to merge / ready for final review are:

clarify the format of the output csv and adjust the language
explicitly specify default values for the init field of the bus objects when they are not given (this could be added to the bus class table)

abirchfield · 2025-06-09T19:16:53Z

Generators commonly specify data on their own power base, overriding the system default. Very rarely would a device override the frequency base, but it could.

abirchfield · 2025-06-09T19:20:36Z

User should be able to specify time as well as date. Generally this could be written automatically by a program writing the file.

superwhiskers · 2025-06-09T19:22:06Z

would it not be more appropriate to call it case_datetime then, or something along those lines? i'll specify that an iso 8601 datetime (with or without timezone) as well as just a date is allowed, then.

abirchfield · 2025-06-09T19:24:40Z

The output CSV format is just an example. If you want to add any specifics go ahead.

abirchfield · 2025-06-09T19:29:39Z

For default values of each of the bus classes, it will be 1.0 for Vr and Va, and it will be 0.0 for all other variables. This is probably easier said in a note below the table rather than cluttering the table.

…_datetime` field

superwhiskers · 2025-06-09T20:29:00Z

my only confusion regarding the existing output format is the meaning of the "solver status" column. it's not clear to me how portable such a field would be. it may be better to just leave that for another output channel (that is, not bundling it into the same file containing samples of all of the monitor channels).

abirchfield · 2025-06-09T20:34:25Z

I would prefer all the output be together. Maybe the solution is to add an optional "mon" field to the header section, for system-level outputs. This could include solver status, but could also include various aggregate measurements. Time, of course, would still always be the first channel.

superwhiskers · 2025-06-09T21:52:13Z

how do you feel about a json lines derived output format like this? each timestep would be like this:

{"time": 0, "solver": { /* solver-dependent object */ }, "buses": [/* bus objects */], "devices": [/*device objects*/]}

the solver object contains status arbitrary status information dependent upon the solver used (as this kind of state can vary between solvers). the bus objects are as follows:

{"number": 0, "monitored": {"variable": /* state */}}

the device objects are as follows:

{"ports": /* same as how they are specified */, "id": /* some value */, "monitored": {/* same as above*/}}

this is a quick sketch. i decided to move back to json lines as i couldn't think of a single good way to render this information down to a csv without recreating all of the niceties of a format like json. suggest any changes you think would improve it.

abirchfield · 2025-06-10T13:39:48Z

The basic idea for the output is: you define what variables you want to monitor and the program exports a matrix of those variables across time. CSV is most useful for that, so I at least want a CSV output option. We can add additional, more complicated output formats in the future if they will be useful for anyone.

superwhiskers · 2025-06-10T13:43:52Z

absolutely. my main thought here was that it didn't seem like i could make a single csv-derived format that would fit most use cases so i propose that json lines format and a tool (a python script or something) that converts it to a csv the way you'd want it as proposing a csv like what you were thinking of seems too specific to make sense proposing as a "standard" output format (i'm mainly thinking of the solver data here).

alternatively, we just cut the output format from this specification and develop one outside of it before adding one to this spec.

superwhiskers · 2025-06-10T20:15:51Z

we may need to make a separate pull request then to make a tweak or two to the output format as i don't know how well it'd work as-is (primarily talking about the solver status column still).

some initial tweaks of the input format

647ed01

superwhiskers requested review from abirchfield and pelesh June 6, 2025 20:24

superwhiskers self-assigned this Jun 6, 2025

pelesh reviewed Jun 6, 2025

View reviewed changes

src/Model/PhasorDynamics/INPUT_FORMAT.md Outdated Show resolved Hide resolved

superwhiskers and others added 2 commits June 7, 2025 19:09

update src/Model/PhasorDynamics/INPUT_FORMAT.md

715042d

Co-authored-by: pelesh <peless@ornl.gov>

Aligning with new field labeling paradigm and bus vs node notation

1a2251e

explicitly specify a date format for consistency, minor adjustments t…

18268ab

…o wording

superwhiskers added 2 commits June 9, 2025 15:50

indicate default values for bus classes, allow datetimes in the `case…

fab29e1

…_datetime` field

refine language regarding the datetime field

1c42553

abirchfield approved these changes Jun 10, 2025

View reviewed changes

pelesh merged commit f65eb4f into develop Jun 10, 2025
4 checks passed

some initial tweaks of the input format #132

some initial tweaks of the input format #132

Uh oh!

Conversation

superwhiskers commented Jun 6, 2025

description

proposed changes

further comments

Uh oh!

pelesh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

abirchfield commented Jun 7, 2025

Uh oh!

superwhiskers commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

superwhiskers commented Jun 9, 2025

Uh oh!

superwhiskers commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

superwhiskers commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

superwhiskers commented Jun 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

superwhiskers commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

superwhiskers commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 9, 2025

Uh oh!

superwhiskers commented Jun 9, 2025

Uh oh!

abirchfield commented Jun 10, 2025

Uh oh!

superwhiskers commented Jun 10, 2025

Uh oh!

Uh oh!

superwhiskers commented Jun 10, 2025

Uh oh!

Uh oh!

superwhiskers commented Jun 8, 2025 •

edited

Loading

superwhiskers commented Jun 9, 2025 •

edited

Loading

superwhiskers commented Jun 9, 2025 •

edited

Loading